home *** CD-ROM | disk | FTP | other *** search
- QSORT - Version 3.00
- Copyright (c) 1985, 1986, 1987 - Ben Baker - All rights reserved
-
-
-
- QSORT may be freely copied and distributed. provided that 1) it is
- distributed under the name "QSORT," and 2) this documentation file
- always accompanies it. Vendors wishing to distribute QSORT
- commercially, or with commercial products may contact the author at
- the address below for terms.
-
- QSORT is made available under the "share-ware" concept. If you find
- this program useful, you are asked to send its author a license fee of
- $20 for each machine on which you use it. This will encourage the
- development of other useful, affordable tools. Send checks to:
-
- Ben Baker
- RR #1, Box 637
- E. Alton, Il 62024
-
- Bug reports or suggestions may be sent to the above address or via
- FidoNet mail to Fido node # 100/76, Baker's Acre, or by logging into
- Baker's Acre BBS at 618-251-2169.
-
-
- --- INTRODUCTION ---
-
- QSORT was first designed to be a replacement for, and to overcome the
- limitations of DOS SORT, but has been enhanced a number of times and
- moved to new compilers twice. The current version will sort files
- whose size is limited only by available disk space. File name(s) may
- be given explicitly or QSORT will sort from standard input to standard
- output, and so, may be used in pipes or with redirection. Multiple
- key fields may be specified. Binary files with fixed-length records
- may be sorted, provided only that keys are ASCII fields.
-
- QSORT tries to be very protective of your data. If QSORT has an error
- of any kind, it will terminate with the input file still intact, and
- will return to DOS with ERRORLEVEL = 1. When QSORT successfully
- completes a sort, it terminates with ERRORLEVEL set to 0. (See the
- batch commands in your DOS manual for more information on ERRORLEVEL.)
-
- The command line syntax is a super-set of SORT's, so QSORT may be used
- without other changes in batch files using SORT, but in most cases you
- will probably want to make use of QSORT's greater capabilities.
-
-
- --- The QSORT Command and Options ---
-
- QSORT is invoked with the following command:
-
- QSORT [<in_file> [<out_file>]] [/<key_spec>. . .]
- [/T[<tag>] | /F<len>] [/R] [/S] [/?]
-
- Note that all parameters on the command line are optional.
-
- The <in_file> and <out_file> parameters are "ASCII-Z" file specifiers.
- They may contain disk and path information in the standard DOS format,
- but must not contain "wild-card" characters.
-
- If <in_file> is missing, QSORT sorts from standard input to standard
- output. These are files defined and opened by DOS before QSORT is
- loaded. (See your DOS manual concerning the use of redirection and
- pipes.)
-
- If <in_file> is given but <out_file> is missing, QSORT creates a
- temporary file in the directory containing <in_file> and sorts to the
- temporary file. On successful completion of the sort, <in_file> is
- deleted and the temporary is renamed to <in_file>. The effect is an
- apparent "sort-in-place."
-
- If both file names are given, <in_file> is unchanged and the sorted
- output is written to <out_file>. This feature was added in version
- 3.00 as a convenience. New DOS users sometimes have difficulty with
- the concept of redirection. Note that the following two commands are
- exactly equivalent:
-
- QSORT FILE.TXT FILE.SRT
- QSORT <FILE.TXT >FILE.SRT
-
- In the first, QSORT opens the files. In the second, redirection is
- specified and DOS opens the files. The result is the same. It is an
- error QSORT can't detect if you mix these. For instance:
-
- QSORT FILE.TXT >FILE.SRT
-
- will result in a sort-in-place. QSORT will open FILE.TXT but won't
- know DOS has opened FILE.SRT for it, and will ignore it.
-
- The /<key_spec> Parameter
-
- Up to 30 /<key_spec> parameters may be used to specify key fields and
- are ordered major to minor from left to right. The /<key_spec>
- argument has the form:
-
- /[L][+|-][col][:width]
-
- Note that all elements of this argument are "optional," but at least
- one element must be present following the slant-bar (/). The 'L', if
- present, specifies "lexical" sequence for this field. (See discussion
- of lexical sequence below.) The minus (-) sign specifies reverse order
- for this field. The plus (+) sign (or no sign) specifies normal sort
- order. If present, [col] defines the beginning column of the field.
- If omitted, column 1 is assumed. If present, [:width] defines the
- field width in columns. If width is omitted, the rest of the record
- is assumed to be part of the key. If no key field parameters are
- given, the entire record is the key field. When sorting
- variable-length records, any key field which falls beyond the end of a
- particular record is treated as a null (zero length) field for that
- record, and collates low. When sorting fixed-length records, all
- defined fields must fall within the defined record length.
-
- The /F<len> Parameter
-
- The /F<len> parameter defines the record length for a file of
- fixed-length records. All records in the input file MUST be exactly
- <len> bytes long. The records need not (but may) be terminated with a
- CR/LF sequence. They may contain any data, even binary data, but the
- key fields must be ASCII strings. Strings may be terminated with a
- null (binary zero) character, or may be padded with trailing spaces to
- the full length of the field. Note that QSORT does not attempt to
- support Pascal style strings. These are strings which begin with a
- character whose binary value is a character count. This is followed
- by <count> characters of ASCII data, which in turn is followed by
- random data out to the maximum length of the string. These strings
- may be used as key fields, but the programmer must insure that either
- the last real character is a null character, or the field is padded to
- its full length with spaces. QSORT must be told that the key begins
- in the second character position (the first character of real data).
-
- The /T[<tag>] Parameter
-
- The /T[<tag>] parameter, if present, indicates that the "records" to
- be sorted may be more than a single line long. If <tag> is also
- present, it defines a character to be used to tag the "end-of-record."
- If <tag> is not present, the first empty line terminates the record.
- For this purpose, "empty" means "no characters." A line containing but
- a single space is NOT empty! A line may be "tagged" by placing the
- <tag> character anywhere on the last line of a logical record. The
- entire line, including the tag character will appear as the last line
- of the record.
-
- Note that the /F<len> and /T[<tag>] parameters are incompatible, and
- may not both be specified.
-
- The /R Parameter
-
- The /R parameter is included for compatibility with DOS SORT and is
- redundant. It reverses the sense of sort direction for all key
- fields.
-
- The /S Parameter
-
- The /S parameter tells QSORT to make a statistics report to the screen
- at the end of a run. The report is written to the "standard error"
- device, the console, and may not be redirected. The following is an
- actual statistics report "cut" from the screen after QSORT had sorted
- a 1 mega-byte+ file:
-
- QSORT - Text Sort - Version 3.00
- Copyright (c) 1985, 1986, 1987 - Ben Baker - All rights reserved
-
- 10131 records sorted
- 146 bytes in longest record
-
- 116744 sort phase comparisons
- 53771 merge phase comparisons
-
- 170515 total comparisons
- 16.8 comparisons per input record
-
- 21 temporary merge files created
- 2 merge passes
- 2.3 average passes over data
-
- 6:00 elapsed time
-
- The first two numbers are self-explanatory. The next two are the
- number of times two records were compared during the sort phase and
- the merge phase respectively, followed by the total comparisons.
-
- The next number is total comparisons divided by the number of records
- in the input file. If there is no merge phase, this number is
- typically 10 to 12. If the file is large enough to require merging,
- it is 12 to 20, on average. If it is much larger than 20, it usually
- means that there is something unusual about your input file. It may
- already be sorted, or there may be large blocks of records which
- compare equal. This can happen if you sort on, say column 50 and the
- input file contains a large number of records shorter than 50 bytes.
- In this case, a minor sort key at column 1 may significantly speed
- sorting.
-
- The next two items are self-explanatory. "Average passes over data"
- reflects the number of times each record was read and written. For
- short files not requiring a merge pass, this number will be 1.0. When
- merging is needed, the last merge pass is the one which writes the
- output file and it must read and write every record exactly once.
- Thus when only one merge pass is made, there will be exactly 2.0
- "average passes over data." In the above case the first merge pass
- processed about 30% of the records, hence the value of 2.3.
-
- The above sort was performed on an IBM XT with two hard drives (C and
- D). The input file was on C; the temporary merge files were placed
- on D; and the output file was written to C. The sort of a 1
- mega-byte file took six minutes flat. With my Tiny Turbo accelerator
- turned on, the same sort takes about 3:50.
-
- The /? Parameter
-
- The /? parameter requests help or parameter evaluation. When QSORT
- is executed with the /? parameter alone, it lists a short description
- of the QSORT parameters. If /? is entered as one of several
- parameters, QSORT will produce a short report on the screen describing
- the sort it would perform based on those parameters without actually
- doing a sort.
-
- For example:
-
- QSORT /? /L5:12 /-3:2 /22 /T /R <INFILE.TXT >OUTFILE.TXT
-
- requests a sort on a file of tagged records. It defines three key
- fields, one of them inverted (-). The /R parameter reverses the sense
- of all three fields. Since redirection is specified, QSORT does not
- see or know about the names of the files it will sort. The /?
- parameter requests a report, rather than a sort, and QSORT obligingly
- produces on the screen, the following:
-
- QSORT - Text Sort - Version 3.00
- Copyright (c) 1985, 1986, 1987 - Ben Baker - All rights reserved
-
- With the present arguments, QSORT would sort from STDIN to STDOUT
- Records are multiple lines ending with an empty line
-
- Key fields in descending order of importance are:
- Pos Len Type
-
- 5 12 Lexical Descending
- 3 2 ASCII
- 22 65535 ASCII Descending
-
- The third field has an unspecified length. The value "65535" merely
- means that this field extends to the end of each record.
-
-
- The /M<len> supported in earlier versions of QSORT is no longer
- required, but will be accepted (and ignored) by QSORT. There is no
- "hard-coded" maximum record length in QSORT, but there is a practical
- limit. At some time during every sort, the two longest records in the
- input file must be compared. Therefore, the two longest records must
- be able to fit together in the sort buffer. The sum of their lengths
- cannot exceed about 50K -- not an altogether unreasonable limitation.
- QSORT can be shoe-horned into tighter memory and will run if it can
- find 4K for a sort buffer and 4K for an output buffer, but the two
- longest records must still fit in the sort buffer together.
-
- Arguments may appear in any order on the command line except that
- <in_file> must appear before <out_file>, and /<key_spec> arguments
- must appear in descending order of importance.
-
-
- --- Lexical Sorting ---
-
- The lexical sorting capability was borne out of my own need to sort
- word lists with mixed capitalization. ASCII sequence produced some
- bizarre results when words beginning with 'Z' sorted before those
- beginning with 'a.' Case-insensitive sorting wasn't much better
- because upper and lower case got mixed randomly.
-
- The following table will illustrate what I mean:
-
- INPUT ASCII CASE LEXICAL
- INSENSITIVE
-
- DeLaPort Baker Baker Baker
- Smith Brown brown Brown
- brown DeAngelo bRown bRown
- deLaPorte DeLaPort Brown brown
- Deangelo Deangelo Deangelo DeAngelo
- deAngelo Deangelo deangelo Deangelo
- Brown DelaPort Deangelo Deangelo
- smith DelaPorte deAngelo deAngelo
- delaPorte Harry DeAngelo deangelo
- DelaPort Smith delaPort DeLaPort
- DeAngelo bRown DelaPort DelaPort
- DelaPorte brown delaPort delaPort
- deangelo deAngelo DeLaPort delaPort
- Harry deLaPorte DelaPorte DelaPorte
- delaPort deLaPorte deLaPorte deLaPorte
- Baker deangelo delaPorte deLaPorte
- deLaPorte delaPort deLaPorte delaPorte
- Deangelo delaPort Harry Harry
- bRown delaPorte smith Smith
- delaPort smith Smith smith
-
- The first column is a list of names in arbitrary order. The second is
- an ASCII sort of that list. Third we have one possible
- case-insensitive sort of the list. The fourth column is what I really
- wanted. It is sorted the way these words would be sorted in a
- dictionary (or lexicon). The third and fourth columns both collect
- words of identical spellings together, but in the third column, upper
- and lower case spellings are in arbitrary order, while the fourth
- column places upper case spellings ahead of lower case spellings.
-
- For example, the two occurrences of Smith are widely separated in
- column 2 because one is capitalized and the other is not. Column 3
- brings the two together, but in the wrong order. They might have been
- in the right order, but the order is strictly arbitrary. In column 4,
- Smith comes before smith, and lexical sorting will always put them in
- this order.
-
- Lexical sorting is achieved by making case-insensitive comparisions of
- entire fields. If the fields compare equal, an ASCII comparison is
- made to arbitrate ties. In other words, when "lexical" fields in two
- records have different spellings, the case-insensitive comparison
- determines the order of the records. When "lexical" fields are
- spelled the same, the case-sensitive comparison determines the order
- of the records.
-
- Lexical fields are defined, as indicated above, by placing the letter
- 'L' immediately following the slant-bar (/) in <key_spec> definitions.
-
- Lexical sorting can be very useful when needed, but be aware that
- unnecessarily specifying lexical ordering may degrade performance of
- QSORT.
-
-
- --- Examples ---
-
- Produce a sorted directory listing and display it on the console a
- screen's worth at a time:
-
- A>DIR | QSORT | MORE
-
- This demonstrates the use of QSORT as a "filter" in a "pipe." Produce
- a directory listing sorted by creation date and time, and display it
- on the console a screen's worth at a time:
-
- A>DIR | QSORT /30:2 /24:5 /39 /34:5 | MORE
-
- The output of the DIR command is piped to QSORT. The key fields
- defined are, from left to right (major to minor), year (2 digits),
- month and day, AM/PM flag and time. The output of QSORT is then piped
- to MORE for display.
-
- Next, replace the unsorted FILE.TXT with the same data sorted in
- reverse order. Use columns 10 to 16 as the sort key:
-
- A>QSORT FILE.TXT /-10:7
- or
- A>QSORT FILE.TXT /10:7 /R
- or (SORT compatible)
- A>QSORT FILE.TXT /R /+10
-
- Next, perform a simple sort on a file with up to 240-byte records
-
- A>QSORT LARGE.REC /M240
- or
- A>QSORT LARGE.REC
-
- Note that the "/M240" parameter is no longer needed, but will not
- hurt.
-
- GLOSS.TXT is an unsorted glossary of terms. The term being defined by
- each entry appears first, followed by several lines of definition.
- The entries are separated by empty lines. Produce GLOSS.SRT, a sorted
- version of the glossary:
-
- A>QSORT /T <GLOSS.TXT >GLOSS.SRT (with redirection)
- or
- A>QSORT /T GLOSS.TXT GLOSS.SRT (without redirection)
-
- A lawyer keeps a running log of his billable activities in TIME.LOG.
- The first line of each entry is "mm/dd/yy hh:mm account#." He always
- places a tilde (~) in the last line of each entry. He wishes to sort
- the log by account number, and by ascending date and time within each
- account:
-
- QSORT /16:7 /7:2 /1:5 /10:5 /T~ TIME.LOG
-
- The directory of users for a bulletin board system is kept in a binary
- file of fixed-length records 180 bytes long. The user name is a
- 26-character field beginning in the first position and the city/state
- field is a 16-character field beginning in the fortieth position.
- Sort the file by city/state and name.
-
- QSORT /F180 /40:16 /1:26 USER.BBS
-
- --- Implementation Notes ---
-
- QSORT is intended as an enhanced replacement for DOS SORT. It is
- nearly fully upward compatible, but provides much more flexibility.
- Multiple sort keys may be specified, a pseudo in-place sort may be
- performed and files and/or records of any size may be sorted provided
- only that there is sufficient disk space for work files and the output
- file. QSORT uses the "quick sort" algorithm, which cannot guarantee
- the order of records whose keys are all equal. This is the one
- "incompatibility" with DOS SORT, which retains the original order of
- records when its only key compares equal. This is important to SORT
- because it must be invoked multiple times to effect a multiple key
- sort. With QSORT, you only sort once and there are usually enough
- keys available to insure you get the order you want the first time.
-
- QSORT uses a sort buffer of about 50K bytes and will fill the buffer
- as full as possible, and then sort the contents of the buffer. If it
- has reached the end of the input file and has not yet generated any
- work files, the contents of the buffer are written to the output file,
- completing the sort operation.
-
- If the input file is too large to fit into the sort buffer, as much of
- the input file as possible is read into the buffer, sorted, then
- written to a temporary work file. This process is repeated as many
- times as necessary to process the entire input file, each time
- creating a new work file for the sorted output.
-
- Upon completion of the "sort phase," QSORT begins a "merge phase."
- Each work file is a sorted sub-set of the input file. Thus, work
- files may be read sequentially and combined to produce a sorted
- output. QSORT will open as many work files as DOS permits (more on
- this later). If all the remaining work files can be opened, the
- sorted result is written to the output file. Otherwise, a new work
- file is created and another merge pass will be required. On each
- merge pass, the number of work files is reduced and eventually all
- remaining work files will be opened and the sorted output file will be
- written completing the sort operation.
-
- QSORT is smart enough to never have just one work file remaining,
- which would require an unnecessary copy operation. In fact, QSORT is
- smarter than just that in its handling of the merge phase. If more
- than one merge pass is required, all the data merged during the first
- pass will have to be merged again, so QSORT attempts to minimize the
- first pass. For example, if QSORT discovers it may only open 15 files
- at a time, and there are 16 temporary files, it will only merge two
- files on the first pass, creating a 17th file as it does. Then in the
- second pass, it will merge all 15 remaining files to the output file.
- The less data it processes twice, the faster it performs the sort!
-
- With nothing else to guide it, QSORT places its temporary files in the
- default directory. Either of two "environment variables" can override
- this. (See your DOS manual for information on environment variables
- and the SET command.) The command:
-
- SET QSTMP=<path> or
- SET TMP=<path>
-
- will define a path for QSORT to use for its temporaries. QSORT first
- looks for the environment variable QSTMP. If it does not exist, QSORT
- next looks for TMP. TMP is a de facto standard used by many programs,
- and is usually defined in your AUTOEXEC.BAT batch file. You might
- have TMP specifying a 64K RAM disk to speed up your compiler. In this
- case, an attempt to sort a 100K file is doomed to failure. Rather
- than redefine TMP, you may define QSTMP to force QSORT to use some
- directory on your hard disk. In fact:
-
- SET QSTMP=.
-
- tells QSORT to always use the default directory anyway!
-
- QSORT, to work properly, needs enough space on the output disk to hold
- the output file. Even if the input file is to be deleted and resides
- in the same directory, that is not done until after the output file
- has been successfully written. If one merge pass is required, the
- disk QSORT uses for temporaries should have about 10% more space than
- the size of the input file for temporary merge files. If more than
- one merge pass will be required, allow about twice the size of the
- input file as temporary merge file space.
-
- One of the advantages of controlling where QSORT places its temporary
- files is to insure adequate space for them. A second is speed. If
- the temporary files can be placed on a separate disk from the input
- and output files, disk seeking is minimized and performance improved.
-
- Each time QSORT must create a new temporary merge file, the data put
- into it will be processed again. Obviously, the more files QSORT can
- open during the merge phase, the fewer times it will have to handle
- each record and the faster it can sort large files. If DOS is
- properly pre-conditioned, QSORT can have up to 15 temporary merge
- files open at once, and very large files can be sorted with just one
- sort pass and one merge pass. Unfortunately, that capability is not
- automatic.
-
- DOS has a fixed number of file "handles" that it associates with open
- files. The default number is eight, but DOS opens five of them for
- standard input, standard output, standard error, standard printer and
- standard auxiliary device. That leaves three for merging. A 250K
- input file would produce five temporary merge files and that would
- take three merge passes; merge two into one, leaving four; merge two
- into one leaving three; and finally merge three into the output file.
- In the process, QSORT must read and write about 80% of the file twice
- during the merge phase.
-
- Worse yet, since you need at least three handles for merging, if you
- have resident programs that have open files, you can't merge at all!
-
- DOS can be told to set aside more space for file handles. Each handle
- is only 39 bytes and it's memory very well spent. One process can
- have a maximum of 20 handles open at one time, but since resident
- processes may be using handles, I recommend 25 to 35. To do this, the
- root directory of the DISK OR DISKS YOU BOOT FROM must contain a file
- named CONFIG.SYS. If your boot disk(s) already contains a CONFIG.SYS,
- edit it, or if not, create it to contain the following line:
-
- FILES=25 (or more)
-
- While we're at it, let's add one more thing to CONFIG.SYS which will
- improve the performance of QSORT and many other programs as well. DOS
- provides, by default, two disk buffers. These are the buffers it uses
- to do its disk reads and writes. During the merge phase QSORT may
- have many files open at once, reading from them in more or less random
- order. DOS may have to read the same physical sector several times to
- get all its data. But DOS can remember what's in each buffer and
- where it came from, and will not re-read a sector it already has in a
- buffer. DOS needs 528 bytes for each buffer. I recommend 20 buffers
- to make QSORT perform well under the most adverse conditions. This
- will require an additional 9504 bytes or slightly more than 9K, again
- memory well spent, so we add to CONFIG.SYS the following line:
-
- BUFFERS=20
-
- See your DOS manual for more information on CONFIG.SYS.
-
-
- I hope you find this program useful. Your comments and suggestions
- are welcome. My address is at the top of this file.